Minimum Number of Deletions and Insertions
Let's solve the Minimum Number of Deletions and Insertions problem using Dynamic Programming.
Statement#
Given two strings, str1 and str2, find the minimum number of deletions and insertions required to transform str1 into str2.
Note: This problem has a direct application in the autocorrect feature. It is also used in bioinformatics to quantify the similarity between two DNA sequences.
For example, if we want to transform str1 “pqr” into str2 “tqr”, we need to delete “p” and insert “t” in str1. Therefore, the minimum number of deletions and insertions required are:
- Deletions:
- Insertions:
Constraints:
-
str1.length,str2.length str1andstr2are in the same case English letters.
Examples#
No. | str1 | str2 | Deletions | Insertions |
1 | heap | pea | 2 | 1 |
2 | passport | ppsspt | 3 | 1 |
3 | bed | read | 1 | 2 |
Try it yourself#
Implement your solution in the following playground.
Note: If you clicked the “Submit” button and the code timed out, this means that your solution needs to be optimized.
Hint: Use dynamic programming and see the magic.
Solution#
We will first explore the naive recursive solution to this problem and then see how it can be improved using the Longest Common Substring dynamic programming pattern.
Naive approach#
A naive approach is to process all the characters of each string, one by one. Starting from the first character of both strings, there are three possibilities for every pair of characters being considered:
-
The first characters of both the strings match, in which case, we increment the count of the matching characters by 1. We move one position ahead in both strings, and the function is called recursively on the remaining strings.
-
The first characters of both the strings do not match, in which case, the following possibilities are checked on both strings:
- A recursive call by moving one position ahead in the
str1 - A recursive call by moving one position ahead in the
str2 - Return the maximum of the matching characters found by both the calls
- A recursive call by moving one position ahead in the
-
If we reach the end of either of the two strings, we return .
After finding the n (maximum number of the matching characters subsequence), we can find the number of deletions required on str1 to transform it into str2 by subtracting n from the length of str1. The number of insertions required on str1 to transform it into str2 can be found by subtracting n from the length of str2.
Let’s implement the algorithm as discussed above:
Note: Please observe that if you include the test case commented out in the driver code, the solution is likely to time out. Try to solve the larger problem using the dynamic programming solutions provided below and see the difference.
Time complexity#
The time complexity of the naive approach is exponential, i.e., , where is the length of the longest string.
Space complexity#
As the maximum depth of the recursive call tree is , and each call stores its data on the stack, the space complexity of this approach is , where is the length of the longest string.
Optimized Solution using dynamic programming#
Now, let us improve the recursive solution using top-down and bottom-up approaches.
Top-down solution#
The top-down solution, commonly known as the memoization technique, is an enhancement of the recursive solution. It overcomes the problem of calculating redundant solutions over and over again by storing them in a table. In the naive approach, it computes the maximum number of matching characters subsequence many times. Therefore, we need to memoize by using a 2-D table, lookup_table. This table is initialized using the lengths of both strings in the min_del_ins function.
The basic algorithm works exactly the same as the one in the naive approach. The only difference is that for each character, the value of the maximum number of matching characters is stored in the lookup_table.
-
If the start characters match, the value returned is stored at
lookup_table[i][j]. -
If the start characters don’t match, in which case, the following possibilities are checked on both strings:
- A recursive call by moving one position ahead in the
str1 - A recursive call by moving one position ahead in the
str2 - Store the maximum of the matching characters found by both the calls in the
lookup_table[i][j]and return it
- A recursive call by moving one position ahead in the
Realize that since the values returned are being stored in the lookup_table, the next time the function is called with the same arguments, the recursive call won’t be made. Instead, the value will be directly returned from the lookup_table.
Let’s implement the algorithm as discussed above:
Note: Please observe that if you now uncomment the large test case in the driver code, the dynamic programming solution will still work.
Time complexity#
As we avoided redundant calculations by storing all the intermediate results in a 2-D table, the time complexity of this approach is reduced to , where is the length of the longest string.
Space complexity#
We now need space in memory to store intermediate values.
Bottom-up solution#
The bottom-up solution, also known as the tabulation technique, is an iterative method of solving dynamic programming problems. In this solution, we will again make use of a 2-D table, lookup_table, that stores the maximum number of matching characters subsequence required to find the minimum number of deletions and insertions for converting str1 into str2.
In this case, we build the solution bottom-up by dividing our problem into subproblems.
-
We first make a 2-D table
lookup_table[m+1][n+1]where is the length ofstr1and is the length ofstr2. This table is initialized with s. We need the first row and column to be for the base case. Any entry in this table given bylookup_table[i][j]is the maximum number of matching characters subsequence betweenstr1up till position andstr2up to the position. -
If the characters match, we store the returned values from
lookup_table[i-1][j-1]inlookup_table[i][j]. -
For characters that don’t match, we need to take a maximum of two subproblems:
- Move one position ahead in
str1(the subproblemlookup_table[i-1][j]) - Move one position ahead in
str2(the subproblemlookup_table[i][j-1])
- Move one position ahead in
-
In the end, we have the optimal answer for the maximum number of matching characters subsequence for
str1andstr2in the last position, i.e.,lookup_table[m][n].
With this, we can easily calculate the minimum number of deletions and insertions required to transform str1 into str2
Let’s look at the following illustration to get a better understanding of the solution:
1 of 16
2 of 16
3 of 16
4 of 16
5 of 16
6 of 16
7 of 16
8 of 16
9 of 16
10 of 16
11 of 16
12 of 16
13 of 16
14 of 16
15 of 16
16 of 16
Note: Please observe that if you now uncomment the large test case in the driver code, the dynamic programming solution will still work.
Time complexity#
As we created a 2-D table to store the results of subproblems that would be used later on. Therefore, the time complexity of this approach will be .
Space complexity#
We need space in memory to store the intermediate values.
Shortest Common Supersequence
Edit Distance Problem